Abstract: Scholars in the natural sciences rely on historic literature more than those in any other branch of science. Yet much of this material has limited global distribution and is available in only a few select libraries. This wealth of knowledge is available only to those few who can gain direct access to significant library collections, a situation that is considered one of the chief impediments to the efficiency of research in the field. Community support and new technologies led to the formation of the Biodiversity Heritage Library. The BHL is an international collaboration of natural history libraries working together to make biodiversity literature available for use by the widest possible audience through open access and sustainable management.
Background

The biodiversity community is at the forefront of developing international standards and applying new technologies to merge and expand historic datasets with current research, as exemplified by the Biodiversity Heritage Library (BHL) program. The idea for this project started in March 2005, when scientists, informatics experts, and librarians convened at a session entitled “Libraries and Laboratories” hosted by the Natural History Museum in London to share ideas, goals, and concerns. One outcome was a shared vision to build an integrated digital biodiversity library modeled after Botanicus, Missouri Botanical Garden's digital library. A follow-up organizational meeting was hosted by the Smithsonian Institution Library (SIL) in June of 2006, where librarians from major natural history, botanical garden, and research institutions in the United States and Great Britain were invited to participate in a consortium that would build a global digital collection. All of the participants agreed to move forward, with the Missouri Botanical Garden agreeing to support the development of the technical infrastructure. By February of 2007 a formal organizational meeting was hosted by Harvard's Museum of Comparative Zoology, where the governance structure, operational plans, and working committees for the BHL were formed. Charter members included natural history museum libraries (American Museum of Natural History; Field Museum; Natural History Museum, London; and Smithsonian Institution), botanical garden libraries (Missouri Botanical Garden; New York Botanical Garden; and Royal Botanic Gardens, Kew), as well as academic and research libraries (Harvard University’s Botany Libraries; Ernst Mayr Library of the Museum of Comparative Zoology; and Marine Biological Laboratory/Woods Hole Oceanographic Institution Library). Libraries representing the Academy of Natural Sciences (Philadelphia) and California Academy of Sciences joined in 2008. The Biodiversity Heritage Library portal (www.biodiversitylibrary.org; see Fig. 1) was officially launched in May 2007, with Botanicus as the underpinning of a rapidly expanding biodiversity library.

Fig. 1 The Biodiversity Heritage Library portal


Materials and methods 

Partners in the Biodiversity Heritage Library (BHL) program are working together to digitize the published literature of biodiversity held in their respective collections, and to make that literature available for open access as a part of a global “biodiversity commons”. The BHL program is the scanning and digitization component of the Encyclopedia of Life (EOL) (www.eol.org), which realizes Harvard University professor Edward O. Wilson's vision to create a web page for every species of the earth’s biota. Another key collaborator is the Internet Archive (IA) (www.archive.org), which is dedicated to “universal access to human knowledge” and provides most BHL partners with low-cost mass scanning, archival storage of files, image processing, and technology development. Scanning facilities have been opened in London, New York, Boston, and Washington, D.C. to assist with the BHL and other scanning projects. The IA also allows the BHL to “ingest” other natural history content contributed by non-BHL partners like the California Digital Library, the University of Illinois at Urbana-Champaign, the University of Toronto, and the Boston Library Consortium. The partnership enriches the BHL collection and leverages limited scanning dollars.

The Biodiversity Heritage Library is not a legal entity, but a federation of libraries bound by memoranda of understanding. Members have signed agreements and the library directors represent their respective institutions on an Institutional Council. An elected Executive Committee conducts routine business and works closely with the three salaried positions: an executive director, a technical director, and a collections coordinator. The Executive Committee holds weekly teleconferences, the Institutional Council holds monthly calls, and there is one face-to-face meeting each year. These two groups oversee policy and funding decisions, while the details are managed by a variety of broadly representative committees. The scanning staff members hold a weekly teleconference and have been instrumental in developing the tools that manage bidding, workflow, and quality control protocols. A collections committee monitors the overall cohesiveness of BHL content, refines ingest criteria, and reviews all collections-related issues. An active technical committee designs all aspects of the BHL global infrastructure and explores and engages in partnerships that will advance the project’s mission. The project is supported by the Encyclopedia of Life budget with grants from the John D. and Catherine T. MacArthur Foundation and the Alfred P. Sloan Foundation, funds from EOL’s five anchor institutions, the partner institutions, and other grants.

The BHL consortium is working with the global taxonomic community, rights holders, and other interested parties to ensure that this biodiversity heritage is available to all and contributes to the International Convention on Biological Diversity (CBD) and the Global Biodiversity Information Facility (GBIF). “Taxonomic intelligence” is the inclusion of taxonomic practices, skills, and knowledge within informatics services to manage information about organisms. The service used by the BHL, dubbed the Universal Biological Indexer and Organizer, or uBio, applies a sophisticated algorithm to locate likely name strings in OCR text, has “discovered” 10.7 million name strings in NameBank (Fig. 2) (www.ubio.org/index.php?pagename=namebank), and serves as a name thesaurus. The link between the name service and the BHL collection creates a powerful new tool for scholars.
Fig. 2 NameBank portal
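
The idea of locating likely name strings in OCR text can be illustrated with a deliberately naive sketch. The text does not describe uBio's actual algorithm, so everything below is an assumption made for illustration: the regular expression, the function name find_candidate_names, and the small set of known genera standing in for a name thesaurus such as NameBank.

```python
import re

# Naive pattern for a Latin binomial: a capitalized genus (3+ letters)
# followed by a lowercase species epithet (3+ letters).
# This is an illustrative heuristic, not uBio's algorithm.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def find_candidate_names(ocr_text, known_genera):
    """Return candidate binomials whose genus appears in known_genera.

    known_genera is a hypothetical stand-in for a name thesaurus lookup.
    Names are returned in order of first appearance, without duplicates.
    """
    found = []
    for genus, epithet in BINOMIAL.findall(ocr_text):
        if genus in known_genera:
            name = f"{genus} {epithet}"
            if name not in found:
                found.append(name)
    return found


if __name__ == "__main__":
    page = ("We collected Rhododendron indicum near the river. "
            "The plant Primula sinensis was common.")
    print(find_candidate_names(page, {"Rhododendron", "Primula"}))
```

Checking each candidate against a list of known genera is what keeps ordinary phrases such as "The plant" from being reported as names; a production service would also need to handle abbreviations, OCR errors, and names broken across lines.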


Systematists and taxonomists need access to the historic literature to support current research. The cited half-life of publications in taxonomy and the “decay rate” are longer than in any other scientific discipline, so the “current” biodiversity literature spans more than 250 years. At the outset of the project, the BHL needed to calculate the scope of the biodiversity domain. OCLC (www.oclc.org/us/en/default.htm), the major international library utility, supported the BHL by merging all of the partners’ catalog records into a database so that OCLC’s collection analysis tool could be used to profile the overall collection. The outcome showed that biodiversity literature is represented by 1.3 million catalog records. More than 800 000 records describe monographs and 40 000 records describe journal titles, with 12 500 records representing current titles. About forty percent of the material was published prior to 1923, generally placing it in the public domain. Sixty-three percent of the records were for works in English, and German was the second most frequent language (9%). The BHL’s scanning efforts have focused on the pre-1923 content that is not readily available and yet essential to taxonomists’ research.
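
To give a sense of scale, the percentages reported above can be turned into approximate record counts. This is only a back-of-the-envelope sketch: the total of 1.3 million is itself a rounded figure, "about forty percent" is taken as exactly 40, and the helper name share is an assumption introduced here.

```python
TOTAL_RECORDS = 1_300_000  # rounded total reported in the text


def share(total, percent):
    """Integer estimate of `percent` percent of `total`."""
    return total * percent // 100


estimates = {
    "published before 1923 (about 40%)": share(TOTAL_RECORDS, 40),
    "works in English (63%)": share(TOTAL_RECORDS, 63),
    "works in German (9%)": share(TOTAL_RECORDS, 9),
}

for label, count in estimates.items():
    print(f"{label}: roughly {count:,} records")
```

Under these assumptions, roughly 520,000 catalog records describe material published before 1923, which is the portion the scanning effort has concentrated on.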

The technical team developed tools to coordinate scanning efforts and avoid duplication. A merged database of members’ serials holdings was created as a “bid list” so that each library can indicate the titles it intends to scan. If problems are discovered, such as missing volumes, pages, or illustrations, then a call goes out to other libraries to scan those volumes. Monographs are selected by each library along subject areas, and the BHL collection is checked prior to scanning to avoid duplication. All items are barcoded and shipping manifests are created using a tool called WonderFetch (biodiversitylibrary.blogspot.com/2008/06/wonderfetchtm-ia-metaxml-fields.htm). The partner libraries can populate fields with data that would not normally be populated as part of the standard IA process, and then store those values alongside each scanned item in the IA repository. The impetus for implementing WonderFetch was not just to automate the inclusion of essential data elements like the volume and issue information for serials, but also to capture due diligence, rights, and licensing information related to each item. Partner libraries underwrite all of the costs associated with identifying, processing, and shipping materials, and BHL grants support the costs associated with scanning and digital processing.
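
The bid-list idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BHL's actual tool: the class BidList, its methods, and the placeholder titles and library codes in the usage example are all hypothetical.

```python
class BidList:
    """Toy bid list: one library claims each serial title for scanning."""

    def __init__(self, holdings):
        # holdings maps a serial title to the set of libraries that hold it
        # (a stand-in for the merged database of members' serials holdings).
        self.holdings = holdings
        self.claims = {}

    def claim(self, title, library):
        """Record a bid. Returns True if `library` holds the claim afterwards."""
        if library not in self.holdings.get(title, set()):
            raise ValueError(f"{library} does not hold {title!r}")
        # The first bid wins; later bids by other libraries are declined,
        # which is how duplicate scanning is avoided.
        return self.claims.setdefault(title, library) == library

    def fallback_libraries(self, title):
        """Other holders that could be asked to scan missing volumes."""
        holders = self.holdings.get(title, set())
        return sorted(holders - {self.claims.get(title)})


if __name__ == "__main__":
    bids = BidList({"Journal A": {"LibX", "LibY"}, "Journal B": {"LibY"}})
    print(bids.claim("Journal A", "LibX"))       # first bid is accepted
    print(bids.claim("Journal A", "LibY"))       # duplicate bid is declined
    print(bids.fallback_libraries("Journal A"))  # who to ask about gaps
```

Keeping the holdings alongside the claims means the same structure answers both questions raised in the text: who is scanning a title, and who else could supply a missing volume.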


Results and Discussion 

The BHL portal currently offers more than 44 000 titles represented by nearly 86 000 volumes delivering more than 32 million pages of content. Users can browse by author, title, or subject, or can use the novel language, year of publication, and source map options (Fig. 3). More refined searches can be achieved by using the search box, which allows the user to search for a specific author, title, subject, or species name. It is the delivery of search results that is unique to BHL. Species name results are delivered as a bibliography that cites the source title, author, date, and pages, and includes a link to the NameBank record. The pages of any volume selected are automatically scanned by the uBio search feature for species names. The results appear in the “Names on this page” box on the lower left-hand corner of the screen (Fig. 4). Links to EOL species pages are highlighted, and clicking on any name in the box generates a bibliography of all other occurrences of the name in the entire portal. For example, selecting Rhododendron indicum will generate the bibliography shown in Fig. 5 that includes links to all source materials.
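
A species name bibliography of this kind behaves like an inverted index from names to the pages on which they occur. The sketch below illustrates that idea only; the portal's real data model is not described in the text, and the function names and the placeholder volume titles are assumptions.

```python
from collections import defaultdict


def build_name_index(pages):
    """Build an index from species name to (title, page number) occurrences.

    `pages` is an iterable of (title, page_number, names_found_on_page)
    tuples, e.g. the output of a name-finding pass over OCR text.
    """
    index = defaultdict(list)
    for title, page_number, names in pages:
        for name in names:
            index[name].append((title, page_number))
    return index


def name_bibliography(index, name):
    """Return every recorded occurrence of `name`, sorted by title and page."""
    return sorted(index.get(name, []))


if __name__ == "__main__":
    scanned = [
        ("Volume A", 12, ["Rhododendron indicum"]),
        ("Volume B", 3, ["Rhododendron indicum", "Primula sinensis"]),
    ]
    index = build_name_index(scanned)
    for title, page in name_bibliography(index, "Rhododendron indicum"):
        print(f"{title}, p. {page}")
```

Because the index is built once per scanned page, looking up all occurrences of a name across the collection becomes a single dictionary access rather than a fresh search of the full text.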


Fig. 3 Portrait and title page in Ernest H. Wilson's A Naturalist in Western China (London: Methuen, 1913)

Fig. 4 Species names discovered by uBio appear in box in lower left corner. Note the links to EOL species pages

The “Download/About this book” tab appears (Fig. 6) when a title is displayed. Users are able to download the bibliographic record, selected pages, images, or the entire volume. The menu also features PDF or OCR download options, and links to views via other portals. In order to download selected pages, users supply an email address and a citation for the request, and then select up to one hundred pages. These documents are retained when appropriate metadata is provided and are made available to other users through CiteBank (citebank.org). CiteBank (Fig. 7) is still under development, but in addition to saving the BHL-selected documents, it is intended to provide robust search and browse capabilities for biodiversity publications stored in multiple international repositories and to aggregate content from as many systems as possible, so that biodiversity researchers have a single point of access to published materials. CiteBank is also intended to provide a storage platform for articles and documents that are digitized but not yet online, and to offer a common system for researchers to share specialized bibliographies. Users will be able to upload, edit, and share their own personal lists of references and citations, and these references will be linked to scanned content in the Biodiversity Heritage Library portal. In addition, BHL offers to scan professional societies’ publications or other publishers’ content at BHL’s expense and integrate the content into the BHL. Agreements with nearly forty publishers have added about one hundred titles to the collection.

The Biodiversity Heritage Library wiki (Fig. 8) (biodivlib.wikispaces.com) presents a wealth of information about the project, detailed instructions and tutorials for using the various features, and lists of members and BHL staff. Developers’ tools are described and documented, and the BHL’s opt-in copyright feature is explained. The BHL also uses popular social media tools to connect with the public, including Twitter, Facebook, and the BHL blog (biodiversitylibrary.blogspot.com). Each site attracts and supports a varied community of scientists and the general public.

Interest in and support for the BHL have grown at an astonishing rate. In less than five years the BHL has grown into an international partnership that mirrors the global nature of biodiversity research. Formal BHL agreements are in place in Europe, China, and Australia, and there is strong interest in South America and Egypt. In Europe, colleagues at twenty-eight European institutions have obtained funding from the European Union eContentplus Program to establish BHL-Europe (www.bhl-europe.eu), which is developing the technical infrastructure and tools to deliver content from many scanning projects throughout the continent. In the United Kingdom work is also proceeding via the BHL and Europeana (Fig. 9) (www.europeana.eu/portal). In China, the Chinese Academy of Sciences supports BHL-China (Fig. 10) (www.bhl-china.org/cms/en), and the Internet Archive installed a scanner in Beijing in the summer of 2010 to help build the BHL-China collection. In Australia, the Atlas of Living Australia (www.ala.org.au), funded by the Australian government’s National Collaborative Research Infrastructure Strategy program, joined BHL in June 2010. Additional partnerships, policies, tools, and tutorials are being explored and developed to refine the BHL and extend its global reach.

A great deal has been achieved through conventional mass scanning technologies and practices, but a significant portion of early biodiversity literature is quite rare and valuable, sometimes fragile, and often the book is too large or has folded maps or illustrations that do not fit on conventional scanning beds. A planning grant, Retooling Special Collections in the Age of Mass Digitization, awarded by the Institute of Museum and Library Services (IMLS) in 2008, allowed BHL partners to identify and develop a cost-effective and efficient large-scale digitization workflow and to explore ways to enhance metadata for library materials that are designated as “special collections.” The group held a series of meetings, communicated by email, and established a wiki to record meetings, track progress, and share documents about costs, statistics, workflows, and small-scale scanning tests. The report included extensive cost analyses and recommendations for equipment configurations to scan rare and oversized materials.

BHL partners are also exploring ways to introduce other essential content to the BHL portal. Collectors’ field notes, plant lists, and diaries often hold important information that supplements content found on specimen labels and published accounts. Access to this primary source material is even more problematic for scholars because most archival collections, if catalogued, are not described in very fine detail. The United States National Herbarium and Smithsonian Institution Archives have received a grant from the Council on Library and Information Resources (CLIR) to catalog all the field books, unpublished journals, loose notes, and sketches that document field research related to all disciplines of biology. The grant, Exposing Biodiversity Field Books and Original Expedition Journals at the Smithsonian Institution, will also build a cataloging tool and create a central repository so that other institutions can contribute their holdings. The enhanced level of description will improve access to these important research materials that are frequently difficult to discover and access remotely.

Fig. 9 Europeana portal

Several BHL partners have been awarded an IMLS grant as a companion grant to the Smithsonian's CLIR proposal. Connecting Content: A Collaboration to Link Field Notes to Specimens and Published Literature will develop a system for integrating biological researchers’ field and specimen notes with museum specimens and related electronically published literature. The enhanced and integrated access to biological data will serve a wide variety of users, and will connect to other ongoing projects such as the Biodiversity Heritage Library.

Fig. 10 BHL-China portal

The Biodiversity Heritage Library will soon benefit from another new collaboration. The International Association of Plant Taxonomists (IAPT) has given its permission to rescan and integrate with the BHL the monumental fifteen-volume botanical bibliography, Taxonomic Literature, 2nd ed. (TL-2). The Smithsonian Institution Library has been awarded an Atherton Seidell Grant to accomplish the scanning and design the schema. The BHL envisions a dynamically linked “TL-3” that will connect citations to published references and allow for corrections and the addition of new and expanded content.

The Biodiversity Heritage Library has achieved remarkable success in its relatively short existence. The partners have demonstrated that independent and geographically dispersed institutions can collaborate effectively, and have proven their ability to generate significant financial support. The technical accomplishments by a small team of talented and dedicated informatics specialists and the efficient and collegial intra-institutional working groups are apparent in the array of tools and services currently delivered and under development via the various BHL interfaces. On June 27, 2010, the American Library Association’s Association for Library Collections & Technical Services (ALCTS) awarded its Outstanding Collaboration Citation to the BHL in recognition of its outstanding collaborative partnership.

The project has generated excitement in the international community and many opportunities to develop new partnerships and sources of funding. Society journal publishers are enthusiastic about participation in the BHL opt-in copyright model. The portal has recorded nearly 1.5 million visits since January of 2008, the taxonomic intelligence tool is highly effective, and there are high levels of OCR accuracy in late 19th and 20th century printing. However, the Biodiversity Heritage Library faces many challenges in the near future. Initial sources of funding end in 2012, and a plan for financial and digital sustainability must be formulated. The rapid international expansion of BHL presents new governance issues and increases the need for clear and focused standards and for strategies to avoid duplication of effort. BHL is working to ensure the technical infrastructure for delivering and preserving content through digitization and retrospective ingestion, as well as the ability to continue to deliver new services as needed by the community.